Appln No. 10/688,041
Amendment dated November 21, 2007
Reply to Office Action of September 21, 2007
Docket No. BOC9-2003-0021 (390) 

Amendments to the Claims:

This listing of claims will replace all prior versions and listings of claims in the instant 
application: 

Listing of Claims:

1. (Currently Amended) A computer-implemented method for debugging and tuning
synthesized audio, comprising the steps of: 

(a) receiving a user-supplied text with a visual user interface; 

(b) generating synthesized audio generated from concatenated phonetic units, the 
synthesized audio being a voice rendering of the user-supplied text; 

(c) displaying a waveform corresponding to the synthesized audio generated from
concatenated phonetic units; 

(d) displaying parameters corresponding to at least one of the phonetic units, the 
parameters including configuration parameters comprising a phonetic alignment marker, 
a phonetic unit label electronically generated without additional user input, and a pitch
mark comprising at least one weight for adjusting at least one search cost function, the at
least one weight comprising at least one of a pitch cost weight and a duration cost weight;

(e) displaying an original recording containing a selected phonetic unit; 

(f) receiving an editing input from the user; 

(g) adjusting at least one configuration parameter in accordance with the editing 
input and storing the at least one configuration parameter in a text-to-speech engine 
configuration file, wherein adjusting includes repositioning [[the]] a phonetic alignment
marker;

(h) highlighting in the display of the original recording at least one user-selected 
phonetic unit; 

(i) correcting elements of a text-to-speech segment dataset of parameters 
corresponding to a segment of the synthesized audio identified as being problematic;

(j) generating a new synthesized waveform corresponding to one or more adjusted
parameters; and 

(k) repeating steps (b)-(j) until a desired synthesized output is generated. 

2. (Currently Amended) The method of claim 1, wherein said displaying parameters 
step further comprises automatically displaying the parameters responsive to a user 
selection of at least a portion of the waveform, the displayed parameters correlating to the 
selected portion of the waveform. 

3. (Original) The method of claim 1, wherein said displaying parameters step further
comprises identifying a portion of the waveform responsive to a user selection of at least
one of the parameters, the identified portion of the waveform correlating to the selected
parameters. 

4. (Cancelled) 

5. (Cancelled) 

6. (Cancelled) 

7. (Cancelled) 

8. (Currently Amended) The method of claim 1, wherein said adjusting step
comprises at least one action selected from the group consisting of deleting a pitch mark,
inserting a pitch mark, and repositioning a pitch mark by deleting a phonetic unit label,
adding a phonetic unit label, modifying the phonetic unit label, and repositioning the
phonetic unit boundaries.

9. (Currently Amended) The method of claim 1, wherein said automatically
displaying parameters step further comprises the step of displaying a waveform from the 
original recording along with the phonetic unit. 

10. (Original) The method of claim 9, wherein edits to the waveform adjust 
parameters in the segment dataset. 

11. (Cancelled) 

12. (Cancelled) 

13. (Currently Amended) A machine-readable storage having stored thereon a 
computer program having a plurality of code sections, the code sections executable by a 
machine for causing the machine to perform the steps of:

(a) receiving a user-supplied text with a visual user interface; 

(b) generating synthesized audio generated from concatenated phonetic units, the 
synthesized audio being a voice rendering of the user-supplied text; 

(c) displaying a waveform corresponding to the synthesized audio generated from
concatenated phonetic units; 

(d) displaying parameters corresponding to at least one of the phonetic units, the 
parameters including configuration parameters comprising a phonetic alignment marker, 
a phonetic unit label electronically generated without additional user input, and a pitch
mark comprising at least one weight for adjusting at least one search cost function, the at
least one weight comprising at least one of a pitch cost weight and a duration cost weight;

(e) displaying an original recording containing a selected phonetic unit; 

(f) receiving an editing input from the user; 

(g) adjusting at least one configuration parameter in accordance with the editing
input and storing the at least one configuration parameter in a text-to-speech engine
configuration file, wherein adjusting includes repositioning [[the]] a phonetic alignment
marker;

(h) highlighting in the display of the original recording at least one user-selected 
phonetic unit; 

(i) correcting elements of a text-to-speech segment dataset of parameters
corresponding to a segment of the synthesized audio identified as being problematic;

(j) generating a new synthesized waveform corresponding to one or more adjusted 
parameters; and 

(k) repeating steps (b)-(j) until a desired synthesized output is generated. 

14. (Currently Amended) The machine-readable storage of claim 13, wherein said
displaying parameters step further comprises automatically displaying the parameters
responsive to a user selection of at least a portion of the waveform, the displayed
parameters correlating to the selected portion of the waveform.

15. (Original) The machine-readable storage of claim 13, wherein said displaying 
parameters step further comprises identifying a portion of the waveform responsive to a 
user selection of at least one of the parameters, the identified portion of the waveform 
correlating to the selected parameters. 

16. (Cancelled)

17. (Cancelled) 

18. (Cancelled) 

19. (Cancelled) 

20. (Currently Amended) The machine-readable storage of claim 13, wherein said 
adjusting step comprises at least one action selected from the group consisting of deleting 
a pitch mark, inserting a pitch mark, and repositioning a pitch mark by deleting a 
phonetic unit label, adding a phonetic unit label, modifying the phonetic unit label, and 
repositioning the phonetic unit boundaries. 

21. (Currently Amended) The machine-readable storage of claim 13, wherein said
automatically displaying parameters step further comprises the step of displaying a 
waveform from the original recording along with the phonetic unit. 

22. (Original) The machine-readable storage of claim 21, wherein edits to the 
waveform adjust parameters in the segment dataset.

23. (Cancelled) 

24. (Cancelled) 

25. (Currently Amended) A system for debugging and tuning synthesized audio,
comprising: 

means for receiving a user-supplied text with a visual user interface; 

means for generating synthesized audio generated from concatenated phonetic 
units, the synthesized audio being a voice rendering of the user-supplied text; 

means for displaying the waveform corresponding to synthesized audio generated 
from concatenated phonetic units; 

means for displaying parameters corresponding to at least one of the phonetic 
units, the parameters including configuration parameters comprising a phonetic alignment
marker, a phonetic unit label electronically generated without additional user input, and a
pitch mark comprising at least one weight for adjusting at least one search cost function,
the at least one weight comprising at least one of a pitch cost weight and a duration cost
weight;

means for displaying an original recording containing a selected phonetic unit;

means for receiving an editing input from the user; and means for adjusting the 
parameters in accordance with the editing input by adjusting and storing in a text-to- 
speech engine configuration file at least one configuration parameter, wherein adjusting 
includes repositioning [[the]] a phonetic alignment marker[[.]];

means for highlighting in the display of the original recording at least one user- 
selected phonetic unit; 

means for correcting elements of a text-to-speech segment dataset of parameters 
corresponding to a segment of the synthesized audio identified as being problematic;

means for generating a new synthesized waveform corresponding to one or more 
adjusted parameters; and 

wherein the system continues to regenerate new synthesized waveforms until a 
desired synthesized output is generated. 

26. (Previously Presented) The method of Claim 1 wherein the parameter updates and
segment dataset corrections are applied in regenerating the synthesized audio. 